Recognizing Descriptive Wikipedia Categories for Historical Figures
Abstract
Wikipedia is a useful knowledge source that benefits many applications in language processing and knowledge representation. An important feature of Wikipedia is its categories. Wikipedia pages are assigned categories according to their contents; these human-annotated labels can be used in information retrieval, ad hoc search improvement, entity ranking, and tag recommendation. However, important pages are usually assigned too many categories, which makes it difficult to recognize the ones that describe the page best. In this paper, we propose an approach to recognize the most descriptive Wikipedia categories. We observe that the historical figures in a precise category are presumably similar to one another, and that this categorical coherence can be evaluated via the texts or Wikipedia links of the category's members. We rank the descriptiveness of Wikipedia categories by their coherence, and our ranking yields an overall agreement of 88.27% with human judgment.
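The coherence idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which similarity measure or text features the authors use, so everything below is an assumption made for illustration: bag-of-words counts, cosine similarity, mean pairwise similarity as the coherence score, and the function names and toy data are all hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def bag_of_words(text):
    """Lowercased word counts; an assumed stand-in for the real text features."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def coherence(member_texts):
    """Mean pairwise similarity of a category's members (0.0 if fewer than two)."""
    vectors = [bag_of_words(t) for t in member_texts]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def rank_categories(categories):
    """Sort category names from most to least coherent."""
    return sorted(categories, key=lambda name: coherence(categories[name]), reverse=True)


# Hypothetical toy data: category name -> short texts of member pages.
toy = {
    "Baroque composers": [
        "italian baroque composer and violinist",
        "german baroque composer and organist",
        "english baroque composer of operas",
    ],
    "1741 deaths": [
        "italian baroque composer and violinist",
        "danish navigator and explorer of the pacific",
        "english agriculturist and inventor of the seed drill",
    ],
}
ranking = rank_categories(toy)
```

In this toy example the occupation-based category ranks above the year-of-death category, because its members share more vocabulary. A link-based variant could plausibly swap the word counts for sets of outgoing Wikipedia links; that is likewise a guess at the general shape, not the paper's stated method.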
Similar resources
Interactions of cultures and top people of Wikipedia from ranking of 24 language editions
Wikipedia is a huge global repository of human knowledge, that can be leveraged to investigate interwinements between cultures. With this aim we apply two methods, Markov chains and Google matrix, for the analysis of the hyperlink networks of 24 Wikipedia language editions, and rank all their articles by PageRank, 2DRank and CheiRank algorithms. Using automatic extraction of people names we obt...
Interactions of Cultures and Top People of Wikipedia from Ranking of 24 Language Editions
Wikipedia is a huge global repository of human knowledge that can be leveraged to investigate interwinements between cultures. With this aim, we apply methods of Markov chains and Google matrix for the analysis of the hyperlink networks of 24 Wikipedia language editions, and rank all their articles by PageRank, 2DRank and CheiRank algorithms. Using automatic extraction of people names, we obtai...
Interactions of cultures and top people of Wikipedia from ranking of 24 language editions
Wikipedia is a huge global repository of human knowledge, that can be leveraged to investigate interwinements between cultures. With this aim, we apply methods of Markov chains and Google matrix, for the analysis of the hyperlink networks of 24 Wikipedia language editions, and rank all their articles by PageRank, 2DRank and CheiRank algorithms. Using automatic extraction of people names, we obt...
Catching the Red Priest: Using Historical Editions of Encyclopaedia Britannica to Track the Evolution of Reputations
In this paper, we investigate the feasibility of using the chronology of changes in historical editions of Encyclopaedia Britannica (EB) to track the changes in the landscape of cultural knowledge, and specifically, the rise and fall in reputations of historical figures. We describe the data-processing pipeline we developed in order to identify the matching articles about historical figures in W...
Recognizing Biographical Sections in Wikipedia
Wikipedia is the largest collection of encyclopedic data ever written in the history of humanity. Thanks to its coverage and its availability in machine-readable format, it has become a primary resource for large-scale research in historical and cultural studies. In this work, we focus on the subset of pages describing persons, and we investigate the task of recognizing biographical sections fro...
Journal title:
- CoRR
Volume abs/1704.07427, Issue -
Pages -
Publication date 2017